
    How Does Imperfect Automatic Indexing Affect Semantic Search Performance?

    Documents in the health domain are often annotated with semantic concepts (i.e., terms) from controlled vocabularies. As the volume of these documents grows, the annotation work is increasingly done by algorithms. Compared to humans, automatic indexing algorithms are imperfect and may assign wrong terms to documents, which affects subsequent search tasks where queries contain these terms. In this work, we aim to understand the performance impact of using imperfectly assigned terms in Boolean semantic searches. We used MeSH terms and biomedical literature search as a case study. We implemented multiple automatic indexing algorithms on real-world Boolean queries that consist of MeSH terms, and found that (1) probabilistic logic can handle inaccurately assigned terms better than traditional Boolean logic, (2) query-level performance is mostly limited by the lowest-performing terms in a query, and (3) mixing a small amount of human indexing with automatic indexing can regain excellent query-level performance. These findings provide important implications for future work on automatic indexing. Comment: 9 pages, 4 figures, HealthNLP 202

    Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation

    Temporal event data are collected across a broad range of domains, and a variety of visual analytics techniques have been developed to empower analysts working with this form of data. These techniques generally display aggregate statistics computed over sets of event sequences that share common patterns. Such techniques are often hindered, however, by the high dimensionality of many real-world event sequence datasets, because the large number of distinct event types within such data prevents effective aggregation. A common coping strategy for this challenge is to group event types together as a pre-process, prior to visualization, so that each group can be represented within an analysis as a single event type. However, computing these event groupings as a pre-process also places significant constraints on the analysis. This paper presents a dynamic hierarchical aggregation technique that leverages a predefined hierarchy of dimensions to computationally quantify the informativeness of alternative levels of grouping within the hierarchy at runtime. This allows users to dynamically explore the hierarchy to select the most appropriate level of grouping to use at any individual step within an analysis. Key contributions include an algorithm for interactively determining the most informative set of event groupings from within a large-scale hierarchy of event types, and a scatter-plus-focus visualization that supports interactive hierarchical exploration. While these contributions are generalizable to other types of problems, we apply them to high-dimensional event sequence analysis using large-scale event type hierarchies from the medical domain. We describe their use within a medical cohort analysis tool called Cadence, demonstrate an example in which the proposed technique supports better views of event sequence data, and report findings from domain expert interviews. Comment: To Appear in IEEE Transactions on Visualization and Computer Graphics (TVCG), Volume 26 Issue 1, 2020. Also part of proceedings for IEEE VAST 201

    Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data

    The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). Many visual analytics systems also enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions. Together, these two facts create the possibility of selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance and visualization, with a user-specified baseline cohort that all other cohorts are compared against, and visual encoding of the drift for each cohort, which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews. Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), Volume 26 Issue 1, 2020. Also part of proceedings for IEEE VAST 201

    Benchmarking the way cities and regions around the world are responding to the global recession

    September 2009. The Gauteng Provincial Government Department of Economic Development (GPGDED) approached the Gauteng City-Region Observatory (GCRO) to provide them with a fast turn-around report that benchmarks sub-national responses to the economic crisis globally. The brief was to provide a review of what cities and regions are doing in response to the crisis in other parts of the world, and to emphasise the action side of the story – what is being done, rather than analysing the differing nature and impact of the crisis in different places. The report offers not so much specific recommendations as a suite of possible interventions that the Gauteng Provincial Government may wish to choose from and implement. For the Gauteng Provincial Department of Economic Development

    State of the Gauteng City-Region review 2011

    This report is associated with an online interactive website which provides the State of the Gauteng City Region review 2011 in full. Link to http://2011.legacy.gcro.unomena.net/ This 'State of the GCR' Review aims to contribute to ideas around how to build an integrated, sustainable and globally competitive city-region which provides more equal opportunities and a better quality of life for all its residents. The Review offers image- and map-rich representations of the considerable datasets and information that GCRO has collected and produced on the GCR, providing an overview of the key dynamics and trends affecting the economy, society, governance and environment of a city-region that is predicted to be the twelfth largest in the world by 2015. The State of the GCR is intended as both an information base and a platform for debate for all stakeholders in the region – government, business, academics and residents – around how to build on the region’s advantages and address its challenges, including rapid urbanisation and migration, poverty, and unequal distribution of wealth. GCRO’s 2011 State of the GCR Review was formally launched on Monday 17 October 2011. A second review, State of the GCR Review 2013, was launched in October 2013. Gauteng City-Region Observatory: the city-region review 2011 © GCRO / Authors: David Everatt, Graeme Gotz, Annsilla Nyar, Sizwe Phakathi and Chris Wray with editorial support from Maryna Storie. Conceptual design and execution ITL Communication & Design. The GCRO is a partnership of the University of Johannesburg, the University of the Witwatersrand, Johannesburg, and the Gauteng Provincial Government

    Human-Computer Collaboration for Visual Analytics: an Agent-based Framework

    The visual analytics community has long aimed to understand users better and assist them in their analytic endeavors. As a result, numerous conceptual models of visual analytics aim to formalize common workflows, techniques, and goals leveraged by analysts. While many of the existing approaches are rich in detail, they each are specific to a particular aspect of the visual analytic process. Furthermore, with an ever-expanding array of novel artificial intelligence techniques and advances in visual analytic settings, existing conceptual models may not provide enough expressivity to bridge the two fields. In this work, we propose an agent-based conceptual model for the visual analytic process by drawing parallels from the artificial intelligence literature. We present three examples from the visual analytics literature as case studies and examine them in detail using our framework. Our simple yet robust framework unifies the visual analytic pipeline to enable researchers and practitioners to reason about scenarios that are becoming increasingly prominent in the field, namely mixed-initiative, guided, and collaborative analysis. Furthermore, it will allow us to characterize analysts, visual analytic settings, and guidance from the lenses of human agents, environments, and artificial agents, respectively

    Periphery Plots for Contextualizing Heterogeneous Time-Based Charts

    Patterns in temporal data can often be found across different scales, such as days, weeks, and months, making effective visualization of time-based data challenging. Here we propose a new approach for providing focus and context in time-based charts to enable interpretation of patterns across time scales. Our approach employs a focus zone with a time axis and a second axis that can represent either quantities or categories, as well as a set of adjacent periphery plots that can aggregate data along the time dimension, the value dimension, or both. We present a framework for periphery plots and describe two use cases that demonstrate the utility of our approach. Comment: To Appear in IEEE VIS 2019 Short Papers. Open source software and other materials available on GitHub: https://github.com/PrecisionVISSTA/PeripheryPlots Video figure available on Vimeo: https://vimeo.com/34967814

    A Survey on Visual Analytics of Social Media Data

    The unprecedented availability of social media data offers substantial opportunities for data owners, system operators, solution providers, and end users to explore and understand social dynamics. However, the exponential growth in the volume, velocity, and variability of social media data prevents people from fully utilizing such data. Visual analytics, which is an emerging research direction, ha..